Overview

Dataset statistics

Number of variables: 12
Number of observations: 1143
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 107.3 KiB
Average record size in memory: 96.1 B
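
The average record size is the total size in memory divided by the number of observations. A minimal sketch of that arithmetic, using the rounded figures reported above (the variable names are illustrative, and the result only agrees with the report to the displayed precision):

```python
# Figures copied from the dataset statistics above.
total_size_kib = 107.3   # total size in memory, in KiB
n_observations = 1143

# 1 KiB is 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```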

Variable types

Numeric: 9
Categorical: 3

Warnings

ad_id is highly correlated with campaign_id (High correlation)
campaign_id is highly correlated with ad_id (High correlation)
interest1 is highly correlated with interest2 and 1 other fields (High correlation)
interest2 is highly correlated with interest1 and 1 other fields (High correlation)
interest3 is highly correlated with interest1 and 1 other fields (High correlation)
impressions is highly correlated with clicks and 3 other fields (High correlation)
clicks is highly correlated with impressions and 3 other fields (High correlation)
spent is highly correlated with impressions and 3 other fields (High correlation)
total_conversion is highly correlated with impressions and 3 other fields (High correlation)
approved_conversion is highly correlated with impressions and 3 other fields (High correlation)
ad_id is highly correlated with impressions and 3 other fields (High correlation)
interest1 is highly correlated with interest2 and 1 other fields (High correlation)
interest2 is highly correlated with interest1 and 1 other fields (High correlation)
interest3 is highly correlated with interest1 and 1 other fields (High correlation)
impressions is highly correlated with ad_id and 3 other fields (High correlation)
clicks is highly correlated with ad_id and 3 other fields (High correlation)
spent is highly correlated with ad_id and 3 other fields (High correlation)
total_conversion is highly correlated with ad_id and 4 other fields (High correlation)
approved_conversion is highly correlated with total_conversion (High correlation)
interest1 is highly correlated with interest2 and 1 other fields (High correlation)
interest2 is highly correlated with interest1 and 1 other fields (High correlation)
interest3 is highly correlated with interest1 and 1 other fields (High correlation)
impressions is highly correlated with clicks and 2 other fields (High correlation)
clicks is highly correlated with impressions and 2 other fields (High correlation)
spent is highly correlated with impressions and 2 other fields (High correlation)
total_conversion is highly correlated with impressions and 3 other fields (High correlation)
approved_conversion is highly correlated with total_conversion (High correlation)
total_conversion is highly correlated with clicks and 3 other fields (High correlation)
clicks is highly correlated with total_conversion and 3 other fields (High correlation)
campaign_id is highly correlated with ad_id (High correlation)
approved_conversion is highly correlated with total_conversion and 3 other fields (High correlation)
ad_id is highly correlated with campaign_id and 3 other fields (High correlation)
impressions is highly correlated with total_conversion and 3 other fields (High correlation)
interest2 is highly correlated with ad_id and 2 other fields (High correlation)
spent is highly correlated with total_conversion and 3 other fields (High correlation)
interest3 is highly correlated with ad_id and 2 other fields (High correlation)
interest1 is highly correlated with ad_id and 2 other fields (High correlation)
gender is highly correlated with campaign_id (High correlation)
campaign_id is highly correlated with gender (High correlation)
ad_id has unique values (Unique)
clicks has 207 (18.1%) zeros (Zeros)
spent has 207 (18.1%) zeros (Zeros)
approved_conversion has 559 (48.9%) zeros (Zeros)
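
The percentages in the zeros warnings are the zero counts divided by the number of observations. A small sketch of that calculation, with the counts copied from the warnings above (`zero_share` is a hypothetical helper, not part of pandas-profiling):

```python
n_observations = 1143
zero_counts = {"clicks": 207, "spent": 207, "approved_conversion": 559}

def zero_share(count, total):
    """Percentage of rows equal to zero, rounded to one decimal place."""
    return round(100 * count / total, 1)

shares = {name: zero_share(count, n_observations) for name, count in zero_counts.items()}
print(shares)
```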

Reproduction

Analysis started: 2021-08-15 12:19:07.090064
Analysis finished: 2021-08-15 12:19:24.426319
Duration: 17.34 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

ad_id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 1143
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 987261.1304
Minimum: 708746
Maximum: 1314415
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.1 KiB

Quantile statistics

Minimum: 708746
5-th percentile: 734245.3
Q1: 777632.5
median: 1121185
Q3: 1121804.5
95-th percentile: 1314348.9
Maximum: 1314415
Range: 605669
Interquartile range (IQR): 344172

Descriptive statistics

Standard deviation: 193992.8147
Coefficient of variation (CV): 0.196495951
Kurtosis: -1.412358983
Mean: 987261.1304
Median Absolute Deviation (MAD): 169983
Skewness: -0.1027848097
Sum: 1128439472
Variance: 3.763321217 × 10^10
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
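
Several of the ad_id figures above can be derived from one another. The sketch below recomputes the range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation. The inputs are the rounded values shown in the report, so the results match only to the displayed precision.

```python
# Values copied from the ad_id statistics above.
minimum, maximum = 708746, 1314415
q1, q3 = 777632.5, 1121804.5
mean, std = 987261.1304, 193992.8147

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance is the squared standard deviation

print(value_range, iqr, round(cv, 6), f"{variance:.4e}")
```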
Value                Count  Frequency (%)
1122303              1      0.1%
1121141              1      0.1%
1121131              1      0.1%
1121132              1      0.1%
1121133              1      0.1%
1121134              1      0.1%
1121136              1      0.1%
1121138              1      0.1%
1121142              1      0.1%
1121129              1      0.1%
Other values (1133)  1133   99.1%

Value    Count  Frequency (%)
708746   1      0.1%
708749   1      0.1%
708771   1      0.1%
708815   1      0.1%
708818   1      0.1%
708820   1      0.1%
708889   1      0.1%
708895   1      0.1%
708953   1      0.1%
708958   1      0.1%

Value    Count  Frequency (%)
1314415  1      0.1%
1314414  1      0.1%
1314412  1      0.1%
1314411  1      0.1%
1314410  1      0.1%
1314409  1      0.1%
1314408  1      0.1%
1314407  1      0.1%
1314406  1      0.1%
1314405  1      0.1%

campaign_id
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 9.1 KiB

Value  Count
936    740
1178   349
916    54

Length

Max length: 4
Median length: 3
Mean length: 3.305336833
Min length: 3

Characters and Unicode

Total characters: 3778
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 916
2nd row: 916
3rd row: 916
4th row: 916
5th row: 916

Common Values

Value  Count  Frequency (%)
936    740    64.7%
1178   349    30.5%
916    54     4.7%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
936    740    64.7%
1178   349    30.5%
916    54     4.7%

Most occurring characters

Value  Count  Frequency (%)
9      794    21.0%
6      794    21.0%
1      752    19.9%
3      740    19.6%
7      349    9.2%
8      349    9.2%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  3778   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
9      794    21.0%
6      794    21.0%
1      752    19.9%
3      740    19.6%
7      349    9.2%
8      349    9.2%

Most occurring scripts

Value   Count  Frequency (%)
Common  3778   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
9      794    21.0%
6      794    21.0%
1      752    19.9%
3      740    19.6%
7      349    9.2%
8      349    9.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3778   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
9      794    21.0%
6      794    21.0%
1      752    19.9%
3      740    19.6%
7      349    9.2%
8      349    9.2%

age
Categorical

Distinct: 4
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 9.1 KiB

Value  Count
30-34  426
45-49  259
35-39  248
40-44  210

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 5715
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
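
As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to count Unicode general categories across the four age labels. `category_counts` is a hypothetical helper and is not necessarily how the report computes its figures. `Nd` is the code for Decimal Number and `Pd` for Dash Punctuation.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# The four distinct age labels: each has four digits and one dash.
counts = category_counts(["30-34", "45-49", "35-39", "40-44"])
print(counts)
```

Each label contributes four digits and one dash, which is consistent with the 80.0% / 20.0% split between Decimal Number and Dash Punctuation reported for this variable.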

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 30-34
2nd row: 30-34
3rd row: 30-34
4th row: 30-34
5th row: 30-34

Common Values

Value  Count  Frequency (%)
30-34  426    37.3%
45-49  259    22.7%
35-39  248    21.7%
40-44  210    18.4%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
30-34  426    37.3%
45-49  259    22.7%
35-39  248    21.7%
40-44  210    18.4%

Most occurring characters

Value  Count  Frequency (%)
4      1574   27.5%
3      1348   23.6%
-      1143   20.0%
0      636    11.1%
5      507    8.9%
9      507    8.9%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    4572   80.0%
Dash Punctuation  1143   20.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
4      1574   34.4%
3      1348   29.5%
0      636    13.9%
5      507    11.1%
9      507    11.1%
Dash Punctuation
Value  Count  Frequency (%)
-      1143   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  5715   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
4      1574   27.5%
3      1348   23.6%
-      1143   20.0%
0      636    11.1%
5      507    8.9%
9      507    8.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  5715   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
4      1574   27.5%
3      1348   23.6%
-      1143   20.0%
0      636    11.1%
5      507    8.9%
9      507    8.9%

gender
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 9.1 KiB

Value  Count
M      592
F      551

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1143
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: M
3rd row: M
4th row: M
5th row: M

Common Values

Value  Count  Frequency (%)
M      592    51.8%
F      551    48.2%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
m      592    51.8%
f      551    48.2%

Most occurring characters

Value  Count  Frequency (%)
M      592    51.8%
F      551    48.2%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  1143   100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
M      592    51.8%
F      551    48.2%

Most occurring scripts

Value  Count  Frequency (%)
Latin  1143   100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
M      592    51.8%
F      551    48.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1143   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
M      592    51.8%
F      551    48.2%

interest1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 40
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.7664042
Minimum: 2
Maximum: 114
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.1 KiB

Quantile statistics

Minimum: 2
5-th percentile: 10
Q1: 16
median: 25
Q3: 31
95-th percentile: 105.9
Maximum: 114
Range: 112
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 26.95213098
Coefficient of variation (CV): 0.8225538211
Kurtosis: 2.226426216
Mean: 32.7664042
Median Absolute Deviation (MAD): 7
Skewness: 1.766272906
Sum: 37452
Variance: 726.4173642
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=40)
Value              Count  Frequency (%)
16                 140    12.2%
10                 85     7.4%
29                 77     6.7%
27                 60     5.2%
15                 51     4.5%
28                 51     4.5%
20                 49     4.3%
64                 48     4.2%
63                 46     4.0%
18                 43     3.8%
Other values (30)  493    43.1%

Value  Count  Frequency (%)
2      25     2.2%
7      24     2.1%
10     85     7.4%
15     51     4.5%
16     140    12.2%
18     43     3.8%
19     32     2.8%
20     49     4.3%
21     36     3.1%
22     33     2.9%

Value  Count  Frequency (%)
114    5      0.4%
113    6      0.5%
112    7      0.6%
111    6      0.5%
110    8      0.7%
109    6      0.5%
108    7      0.6%
107    8      0.7%
106    5      0.4%
105    7      0.6%

interest2
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 66
Distinct (%): 5.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.24584427
Minimum: 3
Maximum: 118
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.1 KiB

Quantile statistics

Minimum: 3
5-th percentile: 12
Q1: 20
median: 28
Q3: 35
95-th percentile: 109
Maximum: 118
Range: 115
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 26.93785281
Coefficient of variation (CV): 0.7431983818
Kurtosis: 2.183842555
Mean: 36.24584427
Median Absolute Deviation (MAD): 7
Skewness: 1.749598895
Sum: 41429
Variance: 725.647914
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
32                 53     4.6%
21                 50     4.4%
31                 49     4.3%
22                 44     3.8%
30                 43     3.8%
33                 41     3.6%
24                 40     3.5%
17                 39     3.4%
19                 39     3.4%
29                 39     3.4%
Other values (56)  706    61.8%

Value  Count  Frequency (%)
3      2      0.2%
4      7      0.6%
5      3      0.3%
6      5      0.4%
7      3      0.3%
8      8      0.7%
9      7      0.6%
10     8      0.7%
11     12     1.0%
12     13     1.1%

Value  Count  Frequency (%)
118    1      0.1%
117    4      0.3%
116    8      0.7%
115    5      0.4%
114    8      0.7%
113    10     0.9%
112    6      0.5%
111    7      0.6%
110    5      0.4%
109    5      0.4%

interest3
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 69
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.22222222
Minimum: 3
Maximum: 120
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.1 KiB

Quantile statistics

Minimum: 3
5-th percentile: 12
Q1: 20
median: 28
Q3: 35
95-th percentile: 108
Maximum: 120
Range: 117
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 26.92467855
Coefficient of variation (CV): 0.7433193464
Kurtosis: 2.19218859
Mean: 36.22222222
Median Absolute Deviation (MAD): 7
Skewness: 1.753171733
Sum: 41402
Variance: 724.9383148
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
22                 51     4.5%
32                 50     4.4%
21                 45     3.9%
30                 45     3.9%
20                 44     3.8%
31                 42     3.7%
24                 41     3.6%
33                 41     3.6%
25                 39     3.4%
29                 39     3.4%
Other values (59)  706    61.8%

Value  Count  Frequency (%)
3      4      0.3%
4      3      0.3%
5      3      0.3%
6      3      0.3%
7      9      0.8%
8      7      0.6%
9      5      0.4%
10     4      0.3%
11     17     1.5%
12     13     1.1%

Value  Count  Frequency (%)
120    1      0.1%
119    1      0.1%
118    3      0.3%
117    2      0.2%
116    6      0.5%
115    6      0.5%
114    7      0.6%
113    5      0.4%
112    9      0.8%
111    10     0.9%

impressions
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1130
Distinct (%): 98.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 186732.133
Minimum: 87
Maximum: 3052003
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.1 KiB

Quantile statistics

Minimum: 87
5-th percentile: 741.2
Q1: 6503.5
median: 51509
Q3: 221769
95-th percentile: 894449.4
Maximum: 3052003
Range: 3051916
Interquartile range (IQR): 215265.5

Descriptive statistics

Standard deviation: 312762.1832
Coefficient of variation (CV): 1.674924279
Kurtosis: 13.12408972
Mean: 186732.133
Median Absolute Deviation (MAD): 49955
Skewness: 3.010185014
Sum: 213434828
Variance: 9.782018325 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
3812                 2      0.2%
529                  2      0.2%
2479                 2      0.2%
1539                 2      0.2%
2879                 2      0.2%
2755                 2      0.2%
1030                 2      0.2%
2077                 2      0.2%
11199                2      0.2%
4259                 2      0.2%
Other values (1120)  1123   98.3%

Value  Count  Frequency (%)
87     1      0.1%
152    2      0.2%
199    1      0.1%
219    1      0.1%
239    1      0.1%
246    1      0.1%
255    1      0.1%
259    1      0.1%
292    1      0.1%
343    1      0.1%

Value    Count  Frequency (%)
3052003  1      0.1%
2286228  1      0.1%
2223278  1      0.1%
2080666  1      0.1%
1705246  1      0.1%
1663441  1      0.1%
1447755  1      0.1%
1428421  1      0.1%
1392288  1      0.1%
1391924  1      0.1%

clicks
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 183
Distinct (%): 16.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.39020122
Minimum: 0
Maximum: 421
Zeros: 207
Zeros (%): 18.1%
Negative: 0
Negative (%): 0.0%
Memory size: 9.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 8
Q3: 37.5
95-th percentile: 161.9
Maximum: 421
Range: 421
Interquartile range (IQR): 36.5

Descriptive statistics

Standard deviation: 56.8924383
Coefficient of variation (CV): 1.703866291
Kurtosis: 8.539645174
Mean: 33.39020122
Median Absolute Deviation (MAD): 8
Skewness: 2.71218742
Sum: 38165
Variance: 3236.749536
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                   207    18.1%
1                   119    10.4%
2                   70     6.1%
3                   51     4.5%
4                   34     3.0%
5                   34     3.0%
7                   29     2.5%
6                   18     1.6%
9                   17     1.5%
20                  16     1.4%
Other values (173)  548    47.9%

Value  Count  Frequency (%)
0      207    18.1%
1      119    10.4%
2      70     6.1%
3      51     4.5%
4      34     3.0%
5      34     3.0%
6      18     1.6%
7      29     2.5%
8      14     1.2%
9      17     1.5%

Value  Count  Frequency (%)
421    1      0.1%
367    1      0.1%
353    1      0.1%
346    1      0.1%
340    1      0.1%
295    1      0.1%
282    1      0.1%
276    1      0.1%
272    1      0.1%
268    1      0.1%

spent
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 869
Distinct (%): 76.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51.36065613
Minimum: 0
Maximum: 639.9499981
Zeros: 207
Zeros (%): 18.1%
Negative: 0
Negative (%): 0.0%
Memory size: 9.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1.480000019
median: 12.36999989
Q3: 60.02499992
95-th percentile: 248.5080003
Maximum: 639.9499981
Range: 639.9499981
Interquartile range (IQR): 58.5449999

Descriptive statistics

Standard deviation: 86.90841794
Coefficient of variation (CV): 1.692120477
Kurtosis: 8.843981056
Mean: 51.36065613
Median Absolute Deviation (MAD): 12.36999989
Skewness: 2.70886702
Sum: 58705.22996
Variance: 7553.073108
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                   207    18.1%
1.379999995         5      0.4%
1.289999962         4      0.3%
1.539999962         4      0.3%
1.230000019         3      0.3%
1.320000052         3      0.3%
1.580000043         3      0.3%
1.590000033         3      0.3%
1.570000052         3      0.3%
1.370000005         3      0.3%
Other values (859)  905    79.2%

Value        Count  Frequency (%)
0            207    18.1%
0.180000007  1      0.1%
0.239999995  1      0.1%
0.409999996  1      0.1%
0.49000001   1      0.1%
0.529999971  1      0.1%
0.540000021  1      0.1%
0.569999993  2      0.2%
0.600000024  1      0.1%
0.720000029  1      0.1%

Value        Count  Frequency (%)
639.9499981  1      0.1%
612.3000032  1      0.1%
603.380002   1      0.1%
541.7000023  1      0.1%
465.0799981  1      0.1%
429.4799981  1      0.1%
422.8400038  1      0.1%
420.5799983  1      0.1%
409.5600026  1      0.1%
402.3000026  1      0.1%

total_conversion
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 32
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.855643045
Minimum: 0
Maximum: 60
Zeros: 8
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 9.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 3
95-th percentile: 11
Maximum: 60
Range: 60
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 4.483593472
Coefficient of variation (CV): 1.570081905
Kurtosis: 38.58919567
Mean: 2.855643045
Median Absolute Deviation (MAD): 0
Skewness: 5.095918881
Sum: 3264
Variance: 20.10261042
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=32)
Value              Count  Frequency (%)
1                  666    58.3%
2                  162    14.2%
3                  78     6.8%
4                  61     5.3%
5                  41     3.6%
6                  22     1.9%
7                  16     1.4%
11                 13     1.1%
8                  13     1.1%
13                 9      0.8%
Other values (22)  62     5.4%

Value  Count  Frequency (%)
0      8      0.7%
1      666    58.3%
2      162    14.2%
3      78     6.8%
4      61     5.3%
5      41     3.6%
6      22     1.9%
7      16     1.4%
8      13     1.1%
9      7      0.6%

Value  Count  Frequency (%)
60     1      0.1%
40     1      0.1%
38     1      0.1%
31     2      0.2%
30     1      0.1%
28     1      0.1%
26     2      0.2%
24     1      0.1%
23     3      0.3%
22     4      0.3%

approved_conversion
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 16
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9440069991
Minimum: 0
Maximum: 21
Zeros: 559
Zeros (%): 48.9%
Negative: 0
Negative (%): 0.0%
Memory size: 9.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 1
95-th percentile: 4
Maximum: 21
Range: 21
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.737708006
Coefficient of variation (CV): 1.840778731
Kurtosis: 34.59340334
Mean: 0.9440069991
Median Absolute Deviation (MAD): 1
Skewness: 4.837539423
Sum: 1079
Variance: 3.019629114
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Value             Count  Frequency (%)
0                 559    48.9%
1                 403    35.3%
2                 86     7.5%
3                 35     3.1%
4                 24     2.1%
6                 7      0.6%
5                 7      0.6%
8                 5      0.4%
10                4      0.3%
7                 4      0.3%
Other values (6)  9      0.8%

Value  Count  Frequency (%)
0      559    48.9%
1      403    35.3%
2      86     7.5%
3      35     3.1%
4      24     2.1%
5      7      0.6%
6      7      0.6%
7      4      0.3%
8      5      0.4%
9      3      0.3%

Value  Count  Frequency (%)
21     1      0.1%
17     1      0.1%
14     2      0.2%
13     1      0.1%
12     1      0.1%
10     4      0.3%
9      3      0.3%
8      5      0.4%
7      4      0.3%
6      7      0.6%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
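
The calculation described above can be sketched in plain Python as follows. The function name and the sample values are illustrative only and are not taken from the dataset or from the report's implementation.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

# Illustrative values only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

r_original = pearson_r(xs, ys)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(xs, [10 * y + 3 for y in ys])
print(r_original, r_rescaled)
```

The second call illustrates the invariance under changes in location and scale mentioned above.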

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
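
A minimal sketch of this procedure: convert each variable to ranks, then apply the covariance-over-standard-deviations ratio to the ranks. Giving tied values the average of their rank positions is an assumption here (it is a common convention), and the helper names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Covariance of the rank variables over the product of their standard deviations."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# A nonlinear but strictly increasing relationship (illustrative values).
rho = spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])
print(rho)
```

Because the example relationship is strictly increasing, the ranks of the two variables coincide and ρ reaches its maximum even though the relationship is not linear.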

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
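
The description above corresponds to the simplest form of the coefficient (often called tau-a), sketched below. This is an illustration of the stated formula only. Statistical libraries commonly use a tie-adjusted variant, so results on data with many ties may differ from what this sketch returns.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # Pairs tied in either variable count as neither.
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Illustrative values: two concordant pairs and one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```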

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.
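
A hedged sketch of a bias-corrected Cramér's V follows. It uses the correction as it is commonly stated for Bergsma's proposal: the squared phi coefficient and the table dimensions are each adjusted by terms in 1/(n-1). The function name is hypothetical, and the report's own implementation may differ in detail.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(row, col)] - expected) ** 2 / expected

    phi2 = chi2 / n
    n_rows, n_cols = len(row_totals), len(col_totals)
    # Bias correction of phi squared and of the table dimensions.
    phi2_corr = max(0.0, phi2 - (n_cols - 1) * (n_rows - 1) / (n - 1))
    rows_corr = n_rows - (n_rows - 1) ** 2 / (n - 1)
    cols_corr = n_cols - (n_cols - 1) ** 2 / (n - 1)
    denom = min(rows_corr - 1, cols_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)

# Illustrative labels: a perfectly associated pair and an independent pair.
v_perfect = cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50)
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25)
print(v_perfect, v_independent)
```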

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

        ad_id  campaign_id    age  gender  interest1  interest2  interest3  impressions  clicks  spent  total_conversion  approved_conversion
   0   708746          916  30-34       M         15         17         17         7350       1   1.43                 2                    1
   1   708749          916  30-34       M         16         19         21        17861       2   1.82                 2                    0
   2   708771          916  30-34       M         20         25         22          693       0   0.00                 1                    0
   3   708815          916  30-34       M         28         32         32         4259       1   1.25                 1                    0
   4   708818          916  30-34       M         28         33         32         4133       1   1.29                 1                    1
   5   708820          916  30-34       M         29         30         30         1915       0   0.00                 1                    1
   6   708889          916  30-34       M         15         16         17        15615       3   4.77                 1                    0
   7   708895          916  30-34       M         16         20         18        10951       1   1.27                 1                    1
   8   708953          916  30-34       M         27         31         31         2355       1   1.50                 1                    0
   9   708958          916  30-34       M         28         32         31         9502       3   3.16                 1                    0

Last rows

        ad_id  campaign_id    age  gender  interest1  interest2  interest3  impressions  clicks       spent  total_conversion  approved_conversion
1133  1314405          936  45-49       F        104        107        110       558666     110  162.639998                14                    5
1134  1314406          936  45-49       F        105        106        109      1118200     235  333.749994                11                    4
1135  1314407          936  45-49       F        106        112        108       107100      23   33.710001                 1                    0
1136  1314408          936  45-49       F        107        113        112       877769     160  232.590001                13                    4
1137  1314409          936  45-49       F        108        112        112       212508      33   47.690000                 4                    1
1138  1314410          936  45-49       F        109        111        114      1129773     252  358.189997                13                    2
1139  1314411          936  45-49       F        110        111        116       637549     120  173.880003                 3                    0
1140  1314412          936  45-49       F        111        113        117       151531      28   40.289999                 2                    0
1141  1314414          936  45-49       F        113        114        117       790253     135  198.710000                 8                    2
1142  1314415          936  45-49       F        114        116        118       513161     114  165.609999                 5                    2